[Core] Move EngineCoreRequest to Request conversion out of EngineCore #21627

linzebing · 2025-07-25T17:43:32Z

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

In the engine core thread, converting EngineCoreRequest to Request takes about 18us, and this conversion sits on the model forward critical path.

Ideally, the conversion should move from the engine core thread to the input request processing thread, which takes it off the critical path.

The change has an extra benefit: because Request becomes available in the input processing thread, it would significantly simplify the pending changes for the block hashing optimization mentioned in #21247.
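A minimal sketch of the idea above, using only the Python standard library. EngineCoreRequest, Request and from_engine_core_request here are simplified, hypothetical stand-ins rather than vLLM's real classes, and the queue stands in for the engine's input queue. The input thread does the conversion, so the core loop only dequeues Request objects that are ready to schedule.

```python
import queue
import threading
from dataclasses import dataclass


# Hypothetical stand-ins: the real vLLM classes carry many more fields.
@dataclass
class EngineCoreRequest:
    request_id: str
    prompt_token_ids: list


@dataclass
class Request:
    request_id: str
    prompt_token_ids: list
    num_computed_tokens: int = 0

    @classmethod
    def from_engine_core_request(cls, req: EngineCoreRequest) -> "Request":
        # The per-request conversion work that is moved off the core thread.
        return cls(req.request_id, list(req.prompt_token_ids))


_DONE = object()  # sentinel marking the end of the input stream


def input_thread(raw: list, q: queue.Queue) -> None:
    # Runs concurrently with the core loop: convert here so the core
    # thread receives objects that are already Requests.
    for req in raw:
        q.put(Request.from_engine_core_request(req))
    q.put(_DONE)


def core_loop(q: queue.Queue) -> list:
    scheduled = []
    while True:
        item = q.get()
        if item is _DONE:
            return scheduled
        # No conversion on this thread: the item is already a Request.
        scheduled.append(item)


q: queue.Queue = queue.Queue()
raw = [EngineCoreRequest("a", [1, 2, 3]), EngineCoreRequest("b", [4])]
t = threading.Thread(target=input_thread, args=(raw, q), daemon=True)
t.start()
scheduled = core_loop(q)
t.join()
print([r.request_id for r in scheduled])  # ['a', 'b']
```

The design point is simply where the conversion cost is paid: on the producer thread, overlapping with whatever the consumer thread is doing.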

Test Plan

export VLLM_USE_MODELSCOPE=False;
export VLLM_TORCH_PROFILER_DIR=~/vllm_profile; # for profiling
export CUDA_VISIBLE_DEVICES=4;
VLLM_USE_V1=1 vllm serve facebook/opt-125m \
    --swap-space 16 \
    --disable-log-requests \
    --host :: \
    --dtype float16

VLLM_USE_V1=1 vllm bench serve \
    --dataset-name random \
    --model facebook/opt-125m \
    --served-model-name facebook/opt-125m \
    --random-input-len 700 \
    --random-output-len 1 \
    --endpoint /v1/completions \
    --ignore-eos \
    --host localhost \
    --port 8000 \
    --num-prompts 100 \
    --profile

Test Result

With the change, handle_client_request in the engine core thread dropped from 35us to 7us.

As expected, Request conversion now runs in parallel with the model forward pass.

(Optional) Documentation Update

github-actions · 2025-07-25T17:43:41Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

gemini-code-assist

Code Review

This pull request effectively moves the EngineCoreRequest to Request conversion out of the engine's critical path, which, as shown by the profiling data, significantly improves performance. The implementation is clean and the logic is sound.

My main concern is about the robustness of the process_input_sockets thread. By moving more complex logic into this daemon thread, the risk of silent failures that could hang the server increases. I've added a high-severity comment with a suggestion to implement more robust error handling to ensure the engine fails gracefully.
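Error handling of the kind Gemini suggests (which the reviewers later asked to keep out of this PR) could look roughly like the sketch below. It is a generic pattern, not vLLM code: InputThreadError, process_input and next_request are hypothetical names. The input thread catches its own failure and forwards it through the queue, so the consumer raises an error instead of waiting for input that will never arrive.

```python
import queue
import threading


class InputThreadError:
    """Hypothetical envelope that carries an exception across the queue."""

    def __init__(self, exc: BaseException) -> None:
        self.exc = exc


def process_input(raw_items, convert, q: queue.Queue) -> None:
    try:
        for item in raw_items:
            q.put(convert(item))
    except Exception as e:
        # An uncaught exception would only end this thread; a consumer
        # blocked on q.get() without a timeout would then wait forever.
        q.put(InputThreadError(e))


def next_request(q: queue.Queue, timeout: float = 5.0):
    item = q.get(timeout=timeout)
    if isinstance(item, InputThreadError):
        raise RuntimeError("input processing thread failed") from item.exc
    return item


def convert(raw: str) -> str:
    # Hypothetical conversion that fails on malformed input.
    if raw == "bad":
        raise ValueError("malformed request")
    return raw.upper()


q: queue.Queue = queue.Queue()
t = threading.Thread(target=process_input, args=(["ok", "bad"], convert, q), daemon=True)
t.start()
print(next_request(q))  # OK
try:
    next_request(q)
except RuntimeError as err:
    print(f"{err} (cause: {err.__cause__!r})")
t.join()
```

Chaining with `raise ... from` keeps the original exception visible to whoever handles the engine-level failure.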

vllm/v1/engine/core.py

linzebing · 2025-07-25T17:46:20Z

@Jialin has generously delegated #21329 to me as my first PR to vLLM.

@njhill: I have addressed your comments and used a tuple to represent the request and current_wave. Let me know if you have further comments.
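A rough sketch of passing the request and its wave together as a tuple, as described above. Request and MiniEngineCore are hypothetical, heavily simplified stand-ins. In this sketch the converted Request does not carry the wave itself, so the producer pairs it with current_wave and the consumer unpacks both into add_request().

```python
import queue
from dataclasses import dataclass


@dataclass
class Request:  # hypothetical stand-in; no wave field of its own
    request_id: str


class MiniEngineCore:  # hypothetical, heavily simplified
    def __init__(self) -> None:
        self.requests: dict = {}
        self.waves: dict = {}

    def add_request(self, request: Request, current_wave: int = 0) -> None:
        self.requests[request.request_id] = request
        self.waves[request.request_id] = current_wave


input_queue: queue.Queue = queue.Queue()

# Producer side (input thread): pair the converted Request with the wave
# number taken from the original incoming request.
input_queue.put((Request("req-0"), 3))

# Consumer side (engine core thread): unpack the tuple and pass both on.
core = MiniEngineCore()
request, current_wave = input_queue.get()
core.add_request(request, current_wave)
print(core.waves)  # {'req-0': 3}
```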

vllm/v1/engine/core.py

Jialin · 2025-07-25T18:49:20Z

vllm/v1/engine/core.py

Is this additional protection, or did we already have this error handling originally?

This is additional protection, in response to Gemini's review above #21627 (comment)


Jialin

Looks good to me!

njhill

Thanks @linzebing, this looks good to me for the most part!

njhill · 2025-07-28T20:51:22Z

vllm/v1/engine/core.py

Maybe rename this:

Suggested change

- def add_request(self, request: Request, current_wave: int = 0):
+ def add_request(self, request: Request, request_wave: int = 0):

vllm/v1/engine/core.py

njhill · 2025-07-28T21:00:19Z

vllm/v1/engine/core.py

I think it would be better to keep the PR focused and not introduce new error handling here.

vllm/v1/engine/core.py

njhill

Thanks @linzebing @Jialin!

One more thing, do you think you could also add a short comment in the code next to each of those two things confirming that they are threadsafe?

linzebing · 2025-07-29T18:05:15Z

Thanks @njhill for the review! I have added comments around thread safety.
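One way such thread-safety notes might read, sketched with a hypothetical InputPreprocessor (the two specific objects njhill refers to are not named in this thread). State that is only ever touched by the input processing thread needs no lock, and a freshly built Request is not shared until it is enqueued. The runtime ownership check is a best-effort debug aid added for illustration, not something the PR is known to include.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Request:  # hypothetical stand-in
    request_id: str
    seen_count: int


class InputPreprocessor:
    """Hypothetical helper that must only run on the input thread."""

    def __init__(self) -> None:
        # Note on thread safety: after construction, this cache is read and
        # written only by the input processing thread, so no lock is needed.
        self._cache: dict = {}
        self._owner = None  # ident of the first thread to call preprocess()

    def preprocess(self, request_id: str) -> Request:
        # Best-effort debug guard for the single-owner claim above.
        me = threading.get_ident()
        if self._owner is None:
            self._owner = me
        elif self._owner != me:
            raise RuntimeError("preprocess() called from a second thread")
        self._cache[request_id] = self._cache.get(request_id, 0) + 1
        # Note on thread safety: the Request is a fresh object that no other
        # thread can see until it is handed over through the queue.
        return Request(request_id, self._cache[request_id])


pre = InputPreprocessor()
q: queue.Queue = queue.Queue()


def worker() -> None:
    for rid in ("r1", "r1"):
        q.put(pre.preprocess(rid))


t = threading.Thread(target=worker)
t.start()
t.join()
print([q.get().seen_count for _ in range(2)])  # [1, 2]
```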

njhill

Thanks @linzebing!

njhill · 2025-07-30T11:16:03Z

@linzebing I think some of the CI test failures are related: https://buildkite.com/vllm/ci/builds/25313#01985782-1b21-4082-9849-1642052665bc/212-1966

linzebing · 2025-07-30T14:56:07Z

@njhill: Thanks for pointing this out! I forgot to check the call sites of EngineCore::add_request in the tests. It should be fixed now:

pytest tests/v1/engine/test_engine_core.py

I also manually inspected the codebase and believe I have now changed all call sites of EngineCore::add_request.
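After the change, callers of EngineCore::add_request, including tests, must pass an already-converted Request. A minimal sketch of updating a test call site follows, using hypothetical stand-ins (FakeEngineCore, add_for_test) rather than the real fixtures in tests/v1/engine/test_engine_core.py. The request_wave parameter name follows njhill's suggested rename.

```python
from dataclasses import dataclass


@dataclass
class EngineCoreRequest:  # hypothetical stand-in for the wire-format request
    request_id: str
    prompt_token_ids: list
    current_wave: int = 0


@dataclass
class Request:  # hypothetical stand-in for the scheduler-side request
    request_id: str
    prompt_token_ids: list

    @classmethod
    def from_engine_core_request(cls, req: EngineCoreRequest) -> "Request":
        return cls(req.request_id, list(req.prompt_token_ids))


class FakeEngineCore:
    def __init__(self) -> None:
        self.added: list = []

    def add_request(self, request: Request, request_wave: int = 0) -> None:
        # After the change, add_request expects an already-converted Request,
        # so an old-style call site passing the raw request fails loudly.
        if not isinstance(request, Request):
            raise TypeError("add_request() takes a Request, not an EngineCoreRequest")
        self.added.append((request.request_id, request_wave))


def add_for_test(core: FakeEngineCore, req: EngineCoreRequest) -> None:
    # Hypothetical test helper: perform the conversion the input thread does.
    core.add_request(Request.from_engine_core_request(req), req.current_wave)


def test_add_request_call_site() -> None:
    core = FakeEngineCore()
    add_for_test(core, EngineCoreRequest("r0", [1, 2], current_wave=2))
    assert core.added == [("r0", 2)]
```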

mergify · 2025-07-30T16:05:38Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @linzebing.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


linzebing · 2025-07-30T21:08:12Z

Looks like the v1-test failure is related to CI. I tested locally, no issue:

pytest tests/v1/kv_connector/unit/test_shared_storage_connector.py

Jialin · 2025-07-30T21:30:44Z

Looks like the v1-test failure is related to CI. I tested locally, no issue:
pytest tests/v1/kv_connector/unit/test_shared_storage_connector.py

@linzebing If you have confirmed the failing test is unrelated, you can either rebase to re-trigger CI or request a force merge in the sig-ci Slack channel.


linzebing requested review from WoosukKwon, robertgshaw2-redhat, njhill, ywang96, comaniac and alexm-redhat as code owners July 25, 2025 17:43

mergify bot added the v1 label Jul 25, 2025

gemini-code-assist bot reviewed Jul 25, 2025

vllm/v1/engine/core.py

Jialin reviewed Jul 25, 2025

Jialin mentioned this pull request Jul 25, 2025

[Core] Convert EngineCoreRequest to Request before reaching the engine core … #21329 (closed)

linzebing force-pushed the request branch from 72006cf to d9dd196 July 27, 2025 14:15

linzebing requested a review from Jialin July 28, 2025 18:07

Jialin reviewed Jul 28, 2025

njhill reviewed Jul 28, 2025

linzebing force-pushed the request branch 2 times, most recently from 67ee54e to fdca348 July 29, 2025 00:19

linzebing requested a review from njhill July 29, 2025 00:22

linzebing force-pushed the request branch 2 times, most recently from c6732fe to 1698a9d July 29, 2025 00:31

njhill reviewed Jul 29, 2025

linzebing force-pushed the request branch from 1698a9d to 0e51e32 July 29, 2025 18:04

linzebing requested a review from njhill July 29, 2025 18:04

linzebing force-pushed the request branch from 0e51e32 to 68a4383 July 29, 2025 18:06

njhill approved these changes Jul 29, 2025

njhill added the ready label Jul 29, 2025

mergify bot added the needs-rebase label Jul 30, 2025

[Core] Move EngineCoreRequest to Request conversion out of EngineCore

2c54371

Signed-off-by: linzebing <[email protected]>

linzebing force-pushed the request branch from e888835 to 2c54371 July 30, 2025 17:03

linzebing requested a review from njhill July 30, 2025 17:04

mergify bot removed the needs-rebase label Jul 30, 2025

simon-mo merged commit ca9e2be into vllm-project:main Jul 30, 2025
64 of 66 checks passed

liuyumoye pushed a commit to liuyumoye/vllm that referenced this pull request Jul 31, 2025

[Core] Move EngineCoreRequest to Request conversion out of EngineCore (…

7bf95a8

…vllm-project#21627) Signed-off-by: linzebing <[email protected]>

vadiklyutiy pushed a commit to CentML/vllm that referenced this pull request Aug 5, 2025

[Core] Move EngineCoreRequest to Request conversion out of EngineCore (…

8221b5f

…vllm-project#21627) Signed-off-by: linzebing <[email protected]>

x22x22 pushed a commit to x22x22/vllm that referenced this pull request Aug 5, 2025

[Core] Move EngineCoreRequest to Request conversion out of EngineCore (…

3b91b17

…vllm-project#21627) Signed-off-by: linzebing <[email protected]> Signed-off-by: x22x22 <[email protected]>

npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025

[Core] Move EngineCoreRequest to Request conversion out of EngineCore (…

0e7666d

…vllm-project#21627) Signed-off-by: linzebing <[email protected]>

jinzhen-lin pushed a commit to jinzhen-lin/vllm that referenced this pull request Aug 9, 2025

[Core] Move EngineCoreRequest to Request conversion out of EngineCore (…

de6278f

…vllm-project#21627) Signed-off-by: linzebing <[email protected]> Signed-off-by: Jinzhen Lin <[email protected]>

noamgat pushed a commit to noamgat/vllm that referenced this pull request Aug 9, 2025

[Core] Move EngineCoreRequest to Request conversion out of EngineCore (…

ff7cbc5

…vllm-project#21627) Signed-off-by: linzebing <[email protected]> Signed-off-by: Noam Gat <[email protected]>

paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025

[Core] Move EngineCoreRequest to Request conversion out of EngineCore (…

5b4dd9f

…vllm-project#21627) Signed-off-by: linzebing <[email protected]> Signed-off-by: Paul Pak <[email protected]>

epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025

[Core] Move EngineCoreRequest to Request conversion out of EngineCore (…

16259fb

…vllm-project#21627) Signed-off-by: linzebing <[email protected]>

zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025

[Core] Move EngineCoreRequest to Request conversion out of EngineCore (…

fc1fb07

…vllm-project#21627) Signed-off-by: linzebing <[email protected]>
